The maximum segment sum problem is to compute, given a list of integers, the largest of the sums
of the contiguous segments of that list. This problem specification maps directly onto a cubic-time
algorithm; however, there is a very elegant linear-time solution too. The problem is a classic exercise
in the mathematics of program construction, illustrating important principles such as calculational
development, pointfree reasoning, algebraic structure, and datatype-genericity. Here, we take a sideways
look at the datatype-generic version of the problem in terms of monadic functional programming,
instead of the traditional relational approach; the presentation is tutorial in style, and leavened
with exercises for the reader.

1 Introduction 

Domain-specific languages are one approach to the general challenge of raising the level of abstraction 
in constructing software systems. Rather than making use of the same general-purpose tools for all 
domains of discourse, one identifies a particular domain of interest, and fashions some tools specifically 
to embody the abstractions relevant to that domain. The intention is that common concerns within that 
domain are abstracted away within the domain-specific tools, so that they can be dealt with once and for 
all rather than being considered over and over again for each development within the domain. 

Accepting the premise of domain-specific over general-purpose tools naturally leads to an explosion 
in the number of tools in the programmer's toolbox — and consequently, greater pressure on the tool 
designer, who needs powerful meta-tools to support the lightweight design of new domain-specific
abstractions for each new domain. Language design can no longer be the preserve of large committees and
long gestation periods; it must be democratized and streamlined, so that individual developers can aspire 
to toolsmithery, crafting their own languages to address their own problems. 

Perhaps the most powerful meta-tool for the aspiring toolsmith is a programming language expressive
enough to host domain-specific embedded languages [15]. That is, rather than designing a new
domain-specific language from scratch, with specialized syntax and a customized syntax-aware editor, 
a dedicated parser, an optimization engine, an interpreter or compiler, debugging and profiling systems, 
and so on, one simply writes a library within the host language. This can be constraining — one has to 
accept the host language's syntax and semantics, which might not entirely match the domain — but it is 
very lightweight, because one can exploit all the existing infrastructure rather than having to reinvent it. 

Essentially, the requirement on the host language is that it provides the right features for capturing 
new abstractions — things like strong typing, higher-order functions, modules, classes, data abstraction, 
datatype-genericity, and so on. If the toolsmith can formalize a property of their domain, the host
language should allow them to express that formalization within the language. One might say that a
sufficiently expressive host language is in fact a domain-specific language for domain-specific languages.

Given a suitably expressive host language, the toolsmith designs a domain-specific language as a
library — of combinators, or classes, or modules, or whatever the appropriate abstraction mechanism is 
in that language. Typically, this consists of a collection of constructs (functions, methods, datatypes) 
together with a collection of laws defining an equational theory for those constructs. The pretty-printing
libraries of Hughes [16] and Wadler [24] are a good example; but so is the relational algebra that underlies
SQL [9].

This tutorial presents an exercise in reasoning with a collection of combinators, representative of 
the kinds of reasoning that can be done with the constructs and laws of any domain-specific embedded 
language. We will take Haskell [22] as our host language, since it provides many of the right features for
expressing domain-specific embedded languages. However, Haskell is still not perfect, so we will take
a somewhat abstract view of it, mixing true Haskell syntax with some mathematical idealizations; our
point is the equational reasoning, not the language in which it is expressed. Section 9 provides a brief
summary of our notation, and there are some exercises with solutions in Section 10.

2 Maximum segment sum 

The particular problem we will be considering is a classic exercise in the mathematics of program
construction, namely that of deriving a linear-time algorithm for the maximum segment sum problem, based
on Horner's Rule. The problem was popularized in Jon Bentley's Programming Pearls column [1] in
Communications of the ACM (and in the subsequent book [2]), but I learnt about it from my DPhil
supervisor Richard Bird's lecture notes on the Theory of Lists [4] and Constructive Functional Programming
[5] and his paper Algebraic Identities for Program Calculation [6], which he was working on around
the time I started my doctorate. It seems like I'm not the only one for whom the problem is a favourite,
because it has since become a bit of a cliché among program calculators; but that won't stop me revisiting
it.

The original problem is as follows. Given a list of numbers (say, a possibly empty list of integers), 
find the largest of the sums of the contiguous segments of that list. In Haskell, this specification could be 
written like so: 

mss :: [Integer] → Integer
mss = maximum · map sum · segs

where segs computes the contiguous segments of a list: 

segs, inits, tails :: [α] → [[α]]
segs = concat · map inits · tails
tails = foldr f [[]] where f x xss = (x : head xss) : xss
inits = foldr g [[]] where g x xss = [] : map (x :) xss

and sum computes the sum of a list of integers, and maximum the maximum of a nonempty list of integers: 

sum, maximum :: [Integer] → Integer
sum = foldr (+) 0
maximum = foldr1 (⊔)

(Here, ⊔ denotes binary maximum.) This specification is executable, but takes cubic time; the problem
is to do better.
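
As a concrete rendering of the specification above, here is a minimal sketch in plain Haskell. It assumes the library versions of inits and tails from Data.List rather than the foldr definitions given in the text, and the name mssSpec is chosen here only for illustration.

```haskell
import Data.List (inits, tails)

-- All contiguous segments: every initial segment of every tail segment.
segs :: [a] -> [[a]]
segs = concatMap inits . tails

-- Executable specification. There are quadratically many segments and
-- each is summed in linear time, hence cubic time overall.
-- (mssSpec is an illustrative name, not one used in the text.)
mssSpec :: [Integer] -> Integer
mssSpec = maximum . map sum . segs
```

Because the empty segment is always included, the result is never negative; for instance, a list of negative numbers has maximum segment sum 0.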

We can get quite a long way just using standard properties of map, inits, and so on. It is
straightforward (see Exercise 1) to calculate that

mss = maximum · map (maximum · map sum · inits) · tails

If we can write maximum · map sum · inits in the form foldr h e, then the map of this can be fused with the
tails to yield scanr h e; this observation is known as the Scan Lemma. Moreover, if h takes constant time,
then this gives a linear-time algorithm for mss.

The crucial observation is based on Horner's Rule for evaluation of polynomials, which is the first
important thing you learn in numerical computing (I was literally taught it in secondary school, in my
sixth-year classes in mathematics). Here is its familiar form:

\[ \sum_{i=0}^{n-1} a_i x^i = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} = a_0 + x (a_1 + x (a_2 + \cdots + x a_{n-1})) \]

but the essence of the rule is about sums of products (see Exercise 2):

\[ \sum_{i=0}^{n-1} \prod_{j=0}^{i-1} u_j = 1 + u_0 + u_0 u_1 + \cdots + u_0 u_1 \cdots u_{n-2} = 1 + u_0 (1 + u_1 (1 + \cdots + u_{n-2})) \]

Expressed in Haskell, this is captured by the equation 

sum · map product · inits = foldr (⊕) e where e = 1; u ⊕ z = e + u × z

(where product = foldr (×) 1 computes the product of a list of integers).
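
The sums-of-products equation can be checked directly in Haskell. The following is a small sketch, with sumProdSpec and sumProdHorner as illustrative names introduced here; the first is the left-hand side of the equation, the second the single fold on the right.

```haskell
import Data.List (inits)

-- Left-hand side: sum the products of all initial segments.
sumProdSpec :: [Integer] -> Integer
sumProdSpec = sum . map product . inits

-- Right-hand side: a single fold, with e = 1 and u (+) z = e + u * z.
sumProdHorner :: [Integer] -> Integer
sumProdHorner = foldr op 1
  where op u z = 1 + u * z
```

For the list [2,3,4] both sides give 1 + 2 + 6 + 24 = 33; the fold computes it as 1 + 2 × (1 + 3 × (1 + 4 × 1)).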

But Horner's Rule is not restricted to sums and products; the essential properties are that addition
and multiplication are associative, that multiplication has a unit, and that multiplication distributes over
addition. This is the algebraic structure of a semiring (but without needing commutativity of addition).
In particular, the so-called tropical semiring on the integers, in which "addition" is binary maximum
and "multiplication" is integer addition, satisfies the requirements. So for the maximum segment sum
problem, we get

maximum · map sum · inits = foldr (⊕) e where e = 0; u ⊕ z = e ⊔ (u + z)

Moreover, ⊕ takes constant time, so this gives a linear-time algorithm for mss (see Exercise 3).
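
Putting the Scan Lemma and the tropical instance of Horner's Rule together gives the linear-time program. The following is a minimal sketch using the Prelude's curried scanr; the helper name step is introduced here for illustration.

```haskell
-- Linear-time maximum segment sum: scanr computes, for every tail of the
-- list, the best sum of an initial segment of that tail; the answer is
-- the largest of those.
mss :: [Integer] -> Integer
mss = maximum . scanr step 0
  where
    -- u (+) z = e `max` (u + z), with e = 0
    step u z = 0 `max` (u + z)
```

On Bentley's example [31, -41, 59, 26, -53, 58, 97, -93, -23, 84] the scan yields 187 at the position of 59, corresponding to the segment 59, 26, -53, 58, 97.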

3 Tail segments, datatype-generically 

About a decade after the initial "theory of lists" work on the maximum segment sum problem, Richard 
Bird, Oege de Moor, and Paul Hoogendijk came up with a datatype-generic version of the problem [3].
It's fairly clear what "maximum" and "sum" mean generically, but not so clear what "segment" means 
for nonlinear datatypes; the point of their paper is basically to resolve that issue. 

Recalling the definition of segs in terms of inits and tails, we see that it would suffice to develop 
datatype-generic notions of "initial segment" and "tail segment". One fruitful perspective is given in 
Bird & co's paper: a "tail segment" of a cons list is just a subterm of that list, and an "initial segment" is 
the list but with some tail (that is, some subterm) replaced with the empty structure. 

So, representing a generic "tail" of a data structure is easy: it's a data structure of the same type, 
and a subterm of the term denoting the original structure. A datatype-generic definition of tails is a little 
trickier, though. For lists, you can see it as follows: every node of the original list is labelled with the 
subterm of the original list rooted at that node. I find this a helpful observation, because it explains why 
the tails of a list is one element longer than the list itself: a list with n elements has n + 1 nodes (n conses
and a nil), and each of those nodes gets labelled with one of the n + 1 subterms of the list. Indeed, tails
ought morally to take a possibly empty list and return a non-empty list of possibly empty lists; there are
two different datatypes involved. Similarly, if one wants the "tails" of a data structure of a type in which
some nodes have no labels (such as leaf-labelled trees, or indeed such as the "nil" constructor of lists),
one needs a variant of the datatype providing labels at those positions. Also, for a data structure in which
some nodes have multiple labels, or in which there are different types of labels, one needs a variant for
which every node has precisely one label.

Bird & co call this the labelled variant of the original datatype; if the original is a polymorphic
datatype T α = μ(F α) for some binary shape functor F, then the labelled variant is L α = μ(G α) where
G α β = α × F 1 β; whatever α-labels F may or may not have specified are ignored, and precisely one
α-label per node is provided. Given this insight, it is straightforward to define a datatype-generic variant
subterms of the tails function:

subterms_F = fold_F (in_G · fork (in_F · F id root, F ! id)) :: T α → L (T α)

where root = fst · in_G^{-1} = fold_G fst :: L α → α returns the root label of a labelled data structure, and
!_α = (λa . ()) :: α → 1 is the unique arrow to the unit type. (Informally, having computed the tree of
subterms for each child of a node, we make the tree of subterms for the node itself by assembling all 
the child trees with the label for this node; the label should be the whole structure rooted at this node, 
which can be reconstructed from the roots of the child trees.) What's more, there's a datatype-generic 
scan lemma too: 

scan_F :: (F α β → β) → T α → L β
scan_F f = L (fold_F f) · subterms_F
         = fold_F (in_G · fork (f · F id root, F ! id))

(Again, the label for each node can be constructed from the root labels of each of the child trees.) In
fact, subterms and scan are paramorphisms [19], and can also be nicely written coinductively as well as
inductively [20].
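
To make the labelled variant and the scan lemma concrete, here is a sketch specialized to one hypothetical shape, binary trees with labels at internal nodes only. All the names (Tree, Labelled, foldTree, scanTree) are assumptions made for this illustration rather than definitions from the text, and the generic fold is written out by hand for this one datatype.

```haskell
-- Hypothetical instance: T a, with no label at the Empty node.
data Tree a = Empty | Node a (Tree a) (Tree a)
  deriving (Eq, Show)

-- The labelled variant L a: same shape, exactly one label at every node.
data Labelled a = LEmpty a | LNode a (Labelled a) (Labelled a)
  deriving (Eq, Show)

instance Functor Labelled where
  fmap g (LEmpty a)    = LEmpty (g a)
  fmap g (LNode a l r) = LNode (g a) (fmap g l) (fmap g r)

root :: Labelled a -> a
root (LEmpty a)    = a
root (LNode a _ _) = a

foldTree :: b -> (a -> b -> b -> b) -> Tree a -> b
foldTree e _ Empty        = e
foldTree e f (Node a l r) = f a (foldTree e f l) (foldTree e f r)

-- Label every node with the fold of the subterm rooted there, in one pass:
-- the label of a node is built from the root labels of its children.
scanTree :: b -> (a -> b -> b -> b) -> Tree a -> Labelled b
scanTree e _ Empty        = LEmpty e
scanTree e f (Node a l r) = LNode (f a (root l') (root r')) l' r'
  where
    l' = scanTree e f l
    r' = scanTree e f r

-- Label every node with the subterm rooted there: a scan with the constructors.
subterms :: Tree a -> Labelled (Tree a)
subterms = scanTree Empty Node
```

For this shape the scan lemma reads scanTree e f = fmap (foldTree e f) . subterms: the left-hand side visits each node once, whereas the right-hand side refolds every subterm from scratch.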

4 Initial segments, datatype-generically 

What about a datatype-generic "initial segment"? As suggested above, that's obtained from the original 
data structure by replacing some subterms with the empty structure. Here I think Bird & co sell
themselves a little short, because they insist that the datatype T supports empty structures, which is to say, that
F is of the form F α β = 1 + F′ α β for some F′. This isn't necessary: for an arbitrary F, we can easily
manufacture the appropriate datatype U of "data structures in which some subterms may be replaced by
empty", by defining U α = μ(H α) where H α β = 1 + F α β.

As with subterms, the datatype-generic version of inits is a bit trickier — and this time, the special 
case of lists is misleading. You might think that because a list has just as many initial segments as it 
does tail segments, so the labelled variant ought to suffice just as well here too. But this doesn't work for 
non-linear data structures such as trees — in general, there are many more "initial" segments than "tail" 
segments (because one can make independent choices about replacing subterms with the empty structure 
in each child), and they don't align themselves conveniently with the nodes of the original structure. 

The approach I prefer here is just to use a collection type to hold the "initial segments"; that is, a
monad. This could be the monad of finite lists, or of finite bags, or of finite sets; we will defer until
later the discussion about precisely which monad, and write simply M. That the monad corresponds to
a collection class amounts to it supporting a "union" operator (⊎) :: M α × M α → M α for combining
two collections (append, bag union, and set union, respectively, for lists, bags, and sets), and an "empty"
collection ∅ :: M α as the unit of ⊎, both of which the join of the monad should distribute over [17]:

join ∅ = ∅
join (x ⊎ y) = join x ⊎ join y



Jeremy Gibbons 






(Some authors also add the axiom join (M (λa . ∅) x) = ∅, making ∅ in some sense both a left and a right
zero of composition.) You can think of a computation of type α → M β in two equivalent ways: as a
nondeterministic mapping from an α to one of many (or indeed, no) possible βs, or as a deterministic
function from an α to the collection of all such βs. The choice of monad distinguishes different flavours
of nondeterminism; for example, the finite bag monad models nondeterminism in which the multiplicity 
of computations yielding the same result is significant, whereas with the finite set monad the multiplicity 
is not significant. 

Now we can implement the datatype-generic version of inits by nondeterministically pruning a data 
structure by arbitrarily replacing some subterms with the empty structure; or equivalently, by generating 
the collection of all such prunings. 

prune = fold_F (M in_H · opt Nothing · M Just · δ_2) :: μ(F α) → M (μ(H α))

Here, opt supplies a new alternative for a nondeterministic computation:

opt a x = return a ⊎ x

and δ_2 :: (F α) M → M (F α) distributes the shape functor F over the monad M (which can be defined for all
traversable functors F α; we'll say more about this in Section 7). Informally, once you have computed
all possible ways of pruning each of the children of a node, a pruning of the node itself is formed either 
as Just some node assembled from arbitrarily pruned children, or Nothing for the empty structure. 
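
As an illustration of prune, here is a sketch for one hypothetical shape that has no empty constructor of its own (leaf-labelled binary trees), taking M to be the list monad. The datatypes Tree and Pruned and their constructor names are assumptions made for this example; Pruned plays the role of μ(H α), with Cut standing for the added empty structure.

```haskell
-- Hypothetical shape without an empty structure.
data Tree a = Leaf a | Fork (Tree a) (Tree a)
  deriving (Eq, Show)

-- The same shape with an explicit empty structure added at every position.
data Pruned a = Cut | PLeaf a | PFork (Pruned a) (Pruned a)
  deriving (Eq, Show)

-- All prunings of a tree, in the list monad: either the empty structure,
-- or a node assembled from independently pruned children.
prune :: Tree a -> [Pruned a]
prune (Leaf a)   = Cut : [PLeaf a]
prune (Fork l r) = Cut : [PFork l' r' | l' <- prune l, r' <- prune r]
```

A two-leaf tree already has five prunings, compared with its three subterms, which illustrates why initial segments do not line up with the nodes of the original structure.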



5 Horner's Rule, datatype-generically 

As we've seen, the essential property behind Horner's Rule is one of distributivity, for example of product 
over sum. In the datatype-generic case, we will model this as follows. We are given an (F α)-algebra
(β, f), and an M-algebra (β, k); you might think of these as "datatype-generic product" and "collection
sum", respectively. Then there are two different methods of computing a β result from an F α (M β)
structure: we can either distribute the F α structure over the collection(s) of βs, compute the "product"
f of each structure, and then compute the "sum" k of the resulting products; or we can "sum" each
collection, then compute the "product" of the resulting structure, as illustrated in the following diagram.

F α (M β) --δ_2--> M (F α β) --M f--> M β
    |                                  |
 F id k                                k
    v                                  v
  F α β ------------- f -------------> β

Distributivity of "product" over "sum" is the property that these two different methods agree. For
example, with f :: F ℕ ℕ → ℕ adding all the naturals in an F-structure, and k :: M ℕ → ℕ finding the maximum
of a collection of naturals (returning 0 for the empty collection), the diagram commutes (see Exercise 8).
(To match up with the rest of the story, we have presented distributivity in terms of a bifunctor F,
although the first parameter α plays no role. We could just as well have used a unary functor, dropping
the α, and changing the distributor to δ :: F M → M F.)

Note that (β, k) is required to be an algebra for the monad M. This means that it is not only an
algebra for M as a functor (namely, of type M β → β), but also it should respect the extra structure of
the monad: k · return = id and k · join = k · M k. For the special case of monads of collections, these
amount to what were called reductions in the old Theory of Lists [4] work: functions k of the form ⊕/
for binary operator ⊕ :: β × β → β, distributing over union: ⊕/ (x ⊎ y) = (⊕/ x) ⊕ (⊕/ y) (see Exercise 9).
A consequence of this distributivity property is that ⊕ has to satisfy all the properties that ⊎ does; for
example, if ⊎ is associative, then so must ⊕ be, and so on, and in particular, since ⊎ has a unit ∅, then ⊕
too must have a unit e_⊕ :: β, and ⊕/ ∅ = e_⊕ is forced (see Exercise 10).

Recall that we modelled an "initial segment" of a structure of type μ(F α) as being of type μ(H α),
where H α β = 1 + F α β. We need to generalize "product" to work on this extended structure, which
is to say, we need to specify the value b of the "product" of the empty structure too. Then we have
maybe b f :: H α β → β, so that fold_H (maybe b f) :: μ(H α) → β.

The datatype-generic version of Horner's Rule is then about computing the "sum" of the "products"
of each of the "initial segments" of a data structure:

⊕/ · M (fold_H (maybe b f)) · prune

We can use fold fusion to show that this composition can be computed as a single fold, fold_F ((b ⊕) · f),
given the distributivity property ⊕/ · M f · δ_2 = f · F id (⊕/) above (see Exercise 12). Curiously, it doesn't
seem to matter what value is chosen for b.

We're nearly there. We start with the traversable shape bifunctor F, a collection monad M, and a
distributive law δ_2 :: (F α) M → M (F α). We are given an (F α)-algebra (β, f), an additional element
b :: β, and an M-algebra (β, ⊕/), such that f and ⊕ take constant time and f distributes over ⊕/ in the
sense above. Then we can calculate (see Exercise 13) that

⊕/ · M (fold_H (maybe b f)) · segs = ⊕/ · contents_L · scan_F ((b ⊕) · f)

where

segs = join · M prune · contents_L · subterms :: μ(F α) → M (μ(H α))

and where contents_L :: L → M computes the contents of an L-structure (which, like δ_2, can be defined
using the traversability of F). The scan can be computed in linear time, because its body takes constant
time; moreover, the "sum" ⊕/ and contents can also be computed in linear time (indeed, they can even
be fused into a single pass).

For example, with f :: F ℤ ℤ → ℤ adding all the integers in an F-structure, b = 0 :: ℤ, and ⊕ :: ℤ × ℤ →
ℤ returning the greater of two integers, we get a datatype-generic version of the linear-time maximum
segment sum algorithm.
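
For a feel of what this instance computes, here is a sketch specialized by hand to one hypothetical shape, binary trees with integer labels at internal nodes. The datatype and the name mssTree are assumptions made for this example. A "segment" is a pruning of a subterm, and the scan, the contents, and the final maximum are fused into a single pass that returns a pair.

```haskell
-- Hypothetical shape: internally labelled binary trees.
data Tree a = Empty | Node a (Tree a) (Tree a)

-- Maximum segment sum of a tree, in one traversal.
mssTree :: Tree Integer -> Integer
mssTree = snd . go
  where
    -- go returns (best sum of a pruning of this subterm,
    --             best such sum over this subterm and all its subterms).
    go Empty        = (0, 0)
    go (Node a l r) = (here, maximum [here, bestL, bestR])
      where
        (hereL, bestL) = go l
        (hereR, bestR) = go r
        -- (b (+)) . f with b = 0, (+) = max, and f adding label and children
        here = 0 `max` (a + hereL + hereR)
```

The first component of the pair is the label that the scan would attach to the node; the second is the running maximum of the contents of the scanned tree.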

6 Distributivity reconsidered 

There's a bit of hand-waving in Section 5 to justify the claim that the commuting diagram there really is
a kind of distributivity. What does it have to do with the familiar equation a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)
capturing distributivity of one binary operator ⊗ over another, ⊕?

Recall that δ_2 :: (F α) M → M (F α) distributes the shape functor F over the monad M in its second
argument; this is the form of distribution over "effects" that crops up in the datatype-generic Maximum
Segment Sum problem. More generally, this works for any idiom M; this will be important below.

Generalizing in another direction, one might think of distributing over an idiom in both arguments
of the bifunctor, via an operator δ : F · (M × M) → M · F, which is to say, δ_β :: F (M β) (M β) → M (F β β),
natural in the β. This is the bidist method of the Bitraversable subclass of Bifunctor that Bruno Oliveira
and I used in our paper [14] on the ITERATOR pattern; informally, it requires just that F has a finite
ordered sequence of "element positions". Given δ, one can define δ_2 = δ · F pure id.

That traversability (or equivalently, distributivity over effects) for a bifunctor F is definable for any
idiom, not just any monad, means that one can also conveniently define an operator contents_H : H → List
for any traversable unary functor H. This is because the constant functor K_[β] is an idiom: the pure
method returns the empty list, and idiomatic application appends two lists. Then one can define

contents_H = δ · H wrap

where wrap makes a singleton list. For a traversable bifunctor F, we define contents_F = contents_{F·Δ} where
Δ is the diagonal functor; that is, contents_F :: F β β → [β], natural in the β. (No constant functor is a
monad, except in trivial categories, so this convenient definition of contents doesn't work monadically. 
Of course, one can use a writer monad, but this isn't quite so convenient, because an additional step is 
needed to extract the output.) 
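
In Haskell, the constant idiom is available as Const from Data.Functor.Const, whose Applicative instance uses the monoid structure of lists exactly as described. The following sketch defines contents for any Traversable unary functor; it uses traverse, which combines the distributor with the mapping of wrap.

```haskell
import Data.Functor.Const (Const (..))

-- Contents of any traversable structure, via the constant idiom on lists:
-- pure is the empty list, idiomatic application is append.
contents :: Traversable t => t a -> [a]
contents = getConst . traverse (\a -> Const [a])
```

For instance, contents (Just 'x') is "x". The library function toList from Data.Foldable yields the same elements in the same order for the standard instances.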

One important axiom of δ, suggested by Ondřej Rypáček [23], is that it should be "natural in the
contents": it should leave shape unchanged, and depend on contents only up to the extent of their ordering.
Say that a natural transformation φ : F → G between traversable functors F and G "preserves contents"
if contents_G · φ = contents_F. Then, in the case of unary functors, the formalization of "naturality in the
contents" requires δ to respect content-preserving φ:

δ_G · φ = M φ · δ_F : F M → M G ⇐ contents_G · φ = contents_F

In particular, contents_F : F → List itself preserves contents, and so we expect

δ_List · contents_F = M (contents_F) · δ_F

to hold.

Happily, the same generic operation contents_F provides a datatype-generic means to "fold" over the
elements of an F-structure. Given a binary operator (⊗) :: β × β → β and an initial value b :: β, we can
define an (F β)-algebra (β, f), that is, a function f :: F β β → β, by

f = foldr (⊗) b · contents_F

This is a slight specialization of the presentation of the datatype-generic MSS problem; there we had
f :: F α β → β. The specialization arises because we are hoping to define such an f given a
homogeneous binary operator (⊗). On the other hand, the introduction of the initial value b is no specialization,
as we needed such a value for the "product" of an empty "segment" anyway. This "generic folding"
construction is just what is provided by Ross Paterson's Data.Foldable Haskell library [21].



7 Reducing distributivity 

The general principle about traversals underlying Rypáček's paper [23] on labelling data structures is that
it is often helpful to reduce a general problem about traversal over arbitrary datatypes to a more specific
one about lists, exploiting the "naturality in contents" property of traversal. We'll use that tactic for the
distributivity property in the datatype-generic version of Horner's Rule.
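
Consider the following diagram.

[Diagram: the commuting square of Section 5, from F β (M β) to β, subdivided into faces numbered (1) to (7). Arrows legible in the source include δ_2, F id (⊕/), F return id, M contents_F, M (foldr (⊗) b), ⊕/, foldr (⊗) b, and f; vertices include F β (M β), F (M β) (M β), and M (F β β).]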

The perimeter is just the commuting diagram given in Section 5, the diagram we have to justify. Face (1)
is the definition of δ_2 in terms of δ. Faces (2) and (3) are the expansion of f as generic folding of an
F-structure. Face (4) follows from ⊕/ being an M-algebra, and hence being a left-inverse of return.
Face (5) is an instance of the naturality property of contents_F : F Δ → List. Face (6) is the property that
δ respects the contents-preserving transformation contents_F. Therefore, the whole diagram commutes if
Face (7) does, so let's focus on Face (7):

List (M β) --δ_List--> M (List β) --M (foldr (⊗) b)--> M β
    |                                                   |
List (⊕/)                                               ⊕/
    v                                                   v
 List β ----------------- foldr (⊗) b ----------------> β

Demonstrating that this diagram commutes is not too difficult, because both sides turn out to be list folds.
Around the left and bottom edges, we have a fold foldr (⊗) b after a map List (⊕/), which automatically
fuses to foldr (⊙) b, where ⊙ is defined by x ⊙ a = (⊕/ x) ⊗ a, or, pointlessly, (⊙) = (⊗) · (⊕/) × id.
Around the top and right edges we have the composition ⊕/ · M (foldr (⊗) b) · δ_List. If we can write δ_List
as an instance of foldr, we can then use the fusion law for foldr to prove that this composition equals
foldr (⊙) b (see Exercise 15).

In fact, there are various equivalent ways of writing δ_List as an instance of foldr. The definition given
by Conor McBride and Ross Paterson in their original paper on idioms [18] looked like the identity
function, but with added idiomness:

δ_List [] = pure []
δ_List (mb : mbs) = pure (:) ⊛ mb ⊛ δ_List mbs

In the special case that the idiom is a monad, it can be written in terms of liftM0 (aka return) and liftM2:

δ_List [] = liftM0 []
δ_List (mb : mbs) = liftM2 (:) mb (δ_List mbs)

But we'll use a third equivalent definition:

δ_List [] = return []
δ_List (mb : mbs) = M (:) (cp (mb, δ_List mbs))

where

cp :: M α × M β → M (α × β)
cp (x, y) = join (M (λa . M (a,) y) x)

That is,

δ_List = foldr (M (:) · cp) (return [])
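
This third definition transcribes almost directly into Haskell. The sketch below works for any monad; distList is an illustrative name chosen here for the list distributor, and foldr is the Prelude's curried version, so the uncurried cons is written uncurry (:).

```haskell
import Control.Monad (join)

-- Cartesian product of two collections (or, generally, monadic pairing).
cp :: Monad m => (m a, m b) -> m (a, b)
cp (x, y) = join (fmap (\a -> fmap (\b -> (a, b)) y) x)

-- The list distributor as a fold: pair each element's computation with the
-- distributed rest, then cons the results together.
distList :: Monad m => [m b] -> m [b]
distList = foldr (\mb acc -> fmap (uncurry (:)) (cp (mb, acc))) (return [])
```

In the list monad, distList [[1,2],[3]] gives [[1,3],[2,3]], agreeing with the library function sequence.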

In the use of fold fusion in demonstrating distributivity for lists (Exercise 15), we are naturally led to a
distributivity condition

⊕/ · M (⊗) · cp = (⊗) · (⊕/) × (⊕/)

for cp. This in turn follows from corresponding distributivity properties for collections (see Exercise 16),

⊕/ · M (a ⊗) = (a ⊗) · ⊕/
⊕/ · M (⊗ b) = (⊗ b) · ⊕/

which can finally be discharged by induction over the size of the (finite!) collections (see Exercise 17).



8 Conclusion 

As the title of their paper [3] suggests, Bird & co carried out their development using the relational
approach set out in the Algebra of Programming book [8]; for example, their version of prune is a
relation between data structures and their prunings, rather than being a function that takes a structure
and returns the collection of all its prunings. There's a well-known isomorphism between relations and
set-valued functions, so their relational approach roughly looks equivalent to the monadic one taken here.

I've known their paper well for over a decade (I made essential use of the "labelled variant"
construction in my own papers on generic downwards accumulations [10, 11]), but I've only just noticed
that although they discuss the maximum segment sum problem, they don't discuss problems based on
other semirings, such as the obvious one of integers with addition and multiplication, which is, after all,
the origin of Horner's Rule. Why not? It turns out that the relational approach doesn't work in that case!

There's a hidden condition in the calculation, which relates back to our earlier comment about which
collection monad (finite sets, finite bags, lists, etc) to use. When M is the set monad, distribution over
choice (⊕/ (x ⊎ y) = (⊕/ x) ⊕ (⊕/ y)), and consequently the condition ⊕/ · opt b = (b ⊕) · ⊕/ that we used
in proving Horner's Rule, requires ⊕ to be idempotent, because ⊎ itself is idempotent; but addition is
not idempotent. For exactly this reason, the distributivity property does not in fact hold for addition with
the set monad. But everything does work out with the bag monad, for which ⊎ is not idempotent. The
bag monad models a flavour of nondeterminism in which multiplicity of results matters, as it does for
the sum-of-products instance of the problem, when two copies of the same segment should be treated 
differently from just one copy. Similarly, if the order of results matters — if, for example, we were looking 
for the "first" solution — then we would have to use the list monad rather than bags or sets. The moral of 
the story is that the relational approach is programming with just one monad, namely the set monad; if 
that monad doesn't capture your effects faithfully, you're stuck. 
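
The failure for sets is easy to observe on small examples. In the following sketch, bags and sets are both modelled by Haskell lists, which is an assumption made for simplicity: bag union is append (the order is irrelevant to sum and maximum), and set union is append followed by removal of duplicates. The function names are illustrative.

```haskell
import Data.List (nub)

-- Bag union, modelled by list append.
bagUnion :: [Integer] -> [Integer] -> [Integer]
bagUnion = (++)

-- Set union, modelled by append followed by duplicate removal.
setUnion :: [Integer] -> [Integer] -> [Integer]
setUnion x y = nub (x ++ y)

-- Does reducing a union equal combining the two reductions?
distributes
  :: ([Integer] -> [Integer] -> [Integer])  -- union
  -> ([Integer] -> Integer)                 -- reduction
  -> (Integer -> Integer -> Integer)        -- its binary operator
  -> [Integer] -> [Integer] -> Bool
distributes union reduce op x y = reduce (union x y) == reduce x `op` reduce y
```

With x = [1,2] and y = [2,3], the check succeeds for sum with bagUnion and for maximum with setUnion, but fails for sum with setUnion, because the shared element 2 is counted only once.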

(On the other hand, there are aspects of the problem that work much better relationally than they do 
functionally. We have carefully used maximum only for a linear order, namely the usual ordering of the 
integers. A non-antisymmetric order is more awkward monadically, because there need not be a unique 
maximal value. For example, it is not so easy to compute "the" segment with maximal sum, because
there may be several such. We could refine the ordering by sum on segments to make it once more a 
partial order, perhaps breaking ties lexically; but we have to take care to preserve the right distributivity 
properties. Relationally, however, finding the maximal elements of a finite collection under a partial 
order works out perfectly straightforwardly. We can try the same trick of turning the relation "maximal 
under a partial order" into the collection-valued function "all maxima under a partial order", but the 
equivalent trick on the ordering itself, turning the relation "<" into the collection-valued function "all
values less than this one", runs into problems by taking us outside the world of finite nondeterminism.)

9 Appendix: Notation

For the benefit of those not fluent in Haskell and the Algebra of Programming approach, this appendix
presents some basic notations. For a more thorough introduction, see the books by Richard Bird [7, 8]
and my lecture notes on "origami programming" [12].

Types: Our programs are typed; the statement "x :: α" declares that variable or expression x has type α.
We use product types α × β (with morphism fork :: (α → β) × (α → γ) → (α → β × γ)), sum
types α + β, and function types α → β. We assume throughout that types represent sets, and
functions are total.

Functions: Function application is usually denoted by juxtaposition, "f x", and is left-associative and
tightest-binding. Function composition is backwards, so (f · g) x = f (g x).

Operators: It is often convenient to write binary operators in infix notation; this makes many algebraic
equations more perspicuous. We use sections (a ⊕) and (⊕ b) for partially applied binary operators,
so that (a ⊕) b = a ⊕ b = (⊕ b) a. In contrast to Haskell, we consider binary operators uncurried;
for example, (+) :: ℤ × ℤ → ℤ.

Lists: We use the Haskell syntax "[α]" for a list type, "[]" for the empty list, "a : x" for cons, "x ++ y"
for append, and "[1,2,3]" for a list constant. The fold foldr :: (α × β → β) → β → [α] → β is
ubiquitous; it has the universal property

h = foldr f e ⇔ h [] = e ∧ h · (:) = f · id × h

and as a special case of this, the fusion law

h · foldr f e = foldr f′ e′ ⇐ h e = e′ ∧ h · f = f′ · id × h

The function map :: (α → β) → [α] → [β] is an instance, via map f = foldr ((:) · f × id) []. So is
scanr, which computes the fold of every tail of a list:

scanr :: (α × β → β) → β → [α] → [β]
scanr f e = foldr h [e] where h a (b : x) = f a b : (b : x)

We also use the variant foldr1 f (x ++ [a]) = foldr f a x on non-empty lists.
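
The definition of scanr as a fold can be run as it stands, with curried arguments as in the Haskell Prelude. In this sketch the name scanrViaFoldr is chosen to avoid clashing with the Prelude's own scanr, and the extra clause for an empty accumulator exists only to keep the helper total.

```haskell
-- scanr expressed as a foldr: each step conses the fold of the current
-- tail onto the folds of all shorter tails.
scanrViaFoldr :: (a -> b -> b) -> b -> [a] -> [b]
scanrViaFoldr f e = foldr h [e]
  where
    h a ys@(b : _) = f a b : ys
    h _ []         = error "unreachable: the accumulator is never empty"
```

For example, scanrViaFoldr (+) 0 [1,2,3] is [6,5,3,0], the sums of the tails [1,2,3], [2,3], [3], and [].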

Functors: Datatypes are modelled as functors, which are operations on both types and functions; so for 
F a functor, F α is a type whenever α is, and if f :: α → β then F f :: F α → F β. Moreover, F
respects the compositional structure of functions, preserving identity (F id_α = id_{F α}) and
composition (F (f · g) = F f · F g). For example, List is a functor, with List α = [α] and List f = map f.
We generalize this also to bifunctors, which are binary operators functorial in each argument; for
example, we will see the bifunctor L α β = 1 + α × β below, as the "shape functor" for lists.

Naturality: Polymorphic functions are modelled as natural transformations between functors. A natural 
transformation φ : F → G is a family of functions φ_α :: F α → G α, one for each α, coherent in the
sense of being related by the naturality condition G h · φ_α = φ_β · F h whenever h :: α → β.

Datatype-genericity: Datatype-generic programming is expressed in terms of parametrization by a 
functor. In particular, for a large class of bifunctors F (including all those built from constants 
and the identity using sums and products — the polynomial bifunctors), we can form a kind of least 
fixed point T α = μ(F α) of F in its second argument, giving an inductive datatype. It is a "fixed
point" in the sense that T α ≃ F α (T α); so List α = μ(L α), where L is the shape functor for lists
defined above. We sometimes use Haskell-style datatype definitions, which conveniently name the
constructors too:

   data List α = Nil | Cons (α, List α)

Algebras: An F-algebra is a pair (α, f) such that f :: F α → α. A homomorphism between F-algebras
(α, f) and (β, g) is a function h :: α → β such that h · f = g · F h. One half of the isomorphism
by which an inductive datatype is a fixed point is given by the constructor in_F :: F α (T α) → T α,
through which (T α, in_F) forms an (F α)-algebra. The datatype is the "least" fixed point in the
sense that there is a unique homomorphism to any other (F α)-algebra (β, f); we say that (T α, in_F)
is the initial (F α)-algebra. We write fold_F f for that unique homomorphism; its uniqueness is
captured in the universal property

   h = fold_F f  ⇔  h · in_F = f · F h
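
As an illustration of the initial-algebra reading of fold, here is a minimal Haskell sketch. The names Mu, In, foldMu, L, fromList and sumAlg are hypothetical choices made for this example; the bifunctor class is the one from Data.Bifunctor in base.

```haskell
import Data.Bifunctor (Bifunctor (..))

-- Least fixed point of a bifunctor f in its second argument: T a = μ(F a).
newtype Mu f a = In (f a (Mu f a))

-- fold_F alg: the homomorphism out of the initial algebra,
-- defined by the evaluation rule  fold · in = alg · F id fold.
foldMu :: Bifunctor f => (f a b -> b) -> Mu f a -> b
foldMu alg (In x) = alg (second (foldMu alg) x)

-- Shape functor for lists: L a b = 1 + a × b.
data L a b = Nil | Cons a b

instance Bifunctor L where
  bimap _ _ Nil        = Nil
  bimap g h (Cons a b) = Cons (g a) (h b)

-- Embed an ordinary Haskell list into μ(L a).
fromList :: [a] -> Mu L a
fromList = foldr (\a t -> In (Cons a t)) (In Nil)

-- An (L Int)-algebra with carrier Int: it sums the elements.
sumAlg :: L Int Int -> Int
sumAlg Nil        = 0
sumAlg (Cons a b) = a + b
```

With these definitions, foldMu sumAlg (fromList [1, 2, 3]) computes the sum of the list.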

Monads: A monad M is a functor with two additional natural transformations, a multiplication
join : M M → M and a unit return : Id → M (where Id is the identity functor), that satisfy three laws:

   join · return = id
   join · M return = id
   join · M join = join · join

Collection types such as finite lists, bags, and sets form monads; in each case, return yields a
singleton collection, and join unions a collection of collections into a collection. Another monad
we will use is Haskell's "maybe" datatype and associated morphism

   data Maybe α = Nothing | Just α

   maybe e f Nothing = e
   maybe e f (Just a) = f a

for which return = Just and join = maybe Nothing id. An algebra for a monad M is an M-algebra
(α, f) for M as a functor, satisfying the extra conditions

   f · return = id
   f · join = f · M f
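
The two algebra conditions can be spot-checked in Haskell. The sketch below uses the list monad with summation as the candidate algebra; the names k, unitLaw, multLaw and joinMaybe are hypothetical, and a check at sample points is of course not a proof.

```haskell
import Control.Monad (join)

-- A candidate algebra for the list monad: summing a list of integers.
k :: [Int] -> Int
k = sum

-- k · return = id, checked at a single point.
unitLaw :: Int -> Bool
unitLaw a = k (return a) == a

-- k · join = k · M k, checked at a single list of lists.
multLaw :: [[Int]] -> Bool
multLaw xss = k (join xss) == k (fmap k xss)

-- join for Maybe, exactly as given in the text: join = maybe Nothing id.
joinMaybe :: Maybe (Maybe a) -> Maybe a
joinMaybe = maybe Nothing id
```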

Idioms: An idiom M is a functor with two additional natural transformations, whose components are 
pure_α :: α → M α and ⊛_{α,β} :: M (α → β) × M α → M β, satisfying four laws:

   pure id ⊛ u = u
   pure (·) ⊛ u ⊛ v ⊛ w = u ⊛ (v ⊛ w)
   pure f ⊛ pure a = pure (f a)
   u ⊛ pure a = pure (λf . f a) ⊛ u

Any monad induces an idiom; so does any constant functor Ka, provided that there is a monoidal 
structure on a. 
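
The claim about constant functors can be illustrated with the Const functor from base, which is an idiom (Applicative) whenever its first argument is a monoid. This is only a sketch: identityLaw and combined are hypothetical names, and the additive monoid Sum Int is an arbitrary choice.

```haskell
import Data.Functor.Const (Const (..))
import Data.Monoid (Sum (..))

-- pure id ⊛ u = u at the constant functor: pure contributes the unit mempty.
identityLaw :: Const (Sum Int) Int -> Bool
identityLaw u = getConst (pure id <*> u) == getConst u

-- ⊛ combines the two constants with the monoid operation (here, addition).
combined :: Const (Sum Int) Int
combined = Const (Sum 2) <*> Const (Sum 3)
```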

10 Appendix: Exercises 

1. (See page 2) Calculate that

   mss = maximum · map (maximum · map sum · inits) · tails

just using the definitions of mss, inits, tails, together with (i) distributivity of map over function
composition, (ii) naturality of concat, that is, map f · concat = concat · map (map f), and (iii) that
maximum is a list homomorphism, that is, maximum · concat = maximum · map maximum.

Solution:

   mss
= { definition of mss }
   maximum · map sum · segs
= { definition of segs }
   maximum · map sum · concat · map inits · tails
= { naturality: map f · concat = concat · map (map f) }
   maximum · concat · map (map sum) · map inits · tails
= { homomorphisms: maximum · concat = maximum · map maximum }
   maximum · map maximum · map (map sum) · map inits · tails
= { functors }
   maximum · map (maximum · map sum · inits) · tails
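
The first and last lines of this calculation can be run side by side in Haskell. In this sketch, mssSpec and mssByTails are hypothetical names for the two ends of the calculation; segs is defined as in the second step above.

```haskell
import Data.List (inits, tails)

-- All contiguous segments: the initial segments of every tail.
segs :: [a] -> [[a]]
segs = concat . map inits . tails

-- Specification (first line of the calculation).
mssSpec :: [Int] -> Int
mssSpec = maximum . map sum . segs

-- Calculated form (last line of the calculation).
mssByTails :: [Int] -> Int
mssByTails = maximum . map (maximum . map sum . inits) . tails
```

Both functions count the empty segment, whose sum is 0, so the result is never negative.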

2. (See page 3) Use the sum-of-products version of Horner's Rule to prove the more familiar
polynomial version.

Solution: In the case that all the a_i are non-zero, we have

   ∑_{i=0}^{n−1} a_i x^i = a_0 ∑_{i=0}^{n−1} ∏_{j=0}^{i−1} (a_{j+1} x / a_j)

In general, we have to skip the terms when a_i = 0, but that works out fine; for
example, when a_1 = 0 but a_0, a_2, a_3 ≠ 0, we have

   a_0 + a_1 x + a_2 x^2 + a_3 x^3 = a_0 (1 + u_0 + u_0 u_1)

where u_0 = a_2 x^2 / a_0 and u_1 = a_3 x / a_2.

3. (See page 3) Hand-simulate the execution of the linear-time algorithm for mss

   mss = foldr (⊕) e where e = 0; u ⊕ z = e ⊔ (u + z)

on the list [4, −5, 6, −3, 2, 0, −4, 5, −6, 5]. Do you understand how it works?

Solution:

   foldr (⊕) e [4, −5, 6, −3, 2, 0, −4, 5, −6, 5]
= 4 ⊕ (−5 ⊕ (6 ⊕ (−3 ⊕ (2 ⊕ (0 ⊕ (−4 ⊕ (5 ⊕ (−6 ⊕ (5 ⊕ 0)))))))))
= 4 ⊕ (−5 ⊕ (6 ⊕ (−3 ⊕ (2 ⊕ (0 ⊕ (−4 ⊕ (5 ⊕ (−6 ⊕ 5))))))))
= 4 ⊕ (−5 ⊕ (6 ⊕ (−3 ⊕ (2 ⊕ (0 ⊕ (−4 ⊕ (5 ⊕ 0)))))))
= 4 ⊕ (−5 ⊕ (6 ⊕ (−3 ⊕ (2 ⊕ (0 ⊕ (−4 ⊕ 5))))))
= 4 ⊕ (−5 ⊕ (6 ⊕ (−3 ⊕ (2 ⊕ (0 ⊕ 1)))))
= 4 ⊕ (−5 ⊕ (6 ⊕ (−3 ⊕ (2 ⊕ 1))))
= 4 ⊕ (−5 ⊕ (6 ⊕ (−3 ⊕ 3)))
= 4 ⊕ (−5 ⊕ (6 ⊕ 0))
= 4 ⊕ (−5 ⊕ 6)
= 4 ⊕ 1
= 5
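
The hand simulation can be replayed in Haskell. This is a sketch with hypothetical names (oplus, example, foldResult, tailResults, scanMax), using the Prelude's curried foldr and scanr. Note that the fold on its own yields 5, the best sum over initial segments of the list, whereas taking the maximum over the intermediate results of the scan yields 6 (for instance from the one-element segment [6]).

```haskell
-- u ⊕ z = e ⊔ (u + z), with e = 0.
oplus :: Int -> Int -> Int
oplus u z = max 0 (u + z)

example :: [Int]
example = [4, -5, 6, -3, 2, 0, -4, 5, -6, 5]

-- The fold simulated by hand above.
foldResult :: Int
foldResult = foldr oplus 0 example

-- The same fold applied to every tail of the list, sharing the work.
tailResults :: [Int]
tailResults = scanr oplus 0 example

-- The largest of the per-tail results.
scanMax :: Int
scanMax = maximum tailResults
```

The list tailResults reads off the right-hand operands of the simulation from the bottom line upwards.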

4. (See page 3) Apart from (+, ×) and (⊔, +), what other semirings do you know, and what
variations on the "maximum segment sum" problem do they suggest?

Solution: Here are a few more semirings:

• (⊓, +) works just as well as (⊔, +)

• booleans with (∨, ∧)

• sets with (∪, ∩)

• square matrices of integers, with addition and multiplication

• indeed, square matrices with elements drawn from any semiring (such as the booleans)

• Kleene algebras, e.g. languages (sets of strings) with union and concatenation

However, semirings that are also lattices, such as (∧, ∨) and (∩, ∪), are not very
interesting in this context.
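
To make one of these variations concrete, here is a sketch of a semiring-parametrized version of the problem, instantiated at (⊓, +) to give a minimum segment sum. The Semiring class, its operator names, and the MinPlus type (integers extended with +∞ as the unit of ⊓) are assumptions made for this example rather than definitions from the text.

```haskell
import Data.List (inits, tails)

-- A hypothetical semiring interface: <+> plays the role of "sum", <.> of "product".
class Semiring a where
  zero, one    :: a
  (<+>), (<.>) :: a -> a -> a

-- Specification: combine with <+> the <.>-products of all segments.
segSpec :: Semiring a => [a] -> a
segSpec = foldr (<+>) zero . map (foldr (<.>) one) . concat . map inits . tails

-- Linear-time version, using Horner's Rule inside a scan.
segFast :: Semiring a => [a] -> a
segFast = foldr (<+>) zero . scanr (\u z -> one <+> (u <.> z)) one

-- The (min, +) semiring on integers extended with +infinity.
data MinPlus = Fin Int | Inf deriving (Eq, Show)

instance Semiring MinPlus where
  zero = Inf
  one  = Fin 0
  Inf   <+> y     = y
  x     <+> Inf   = x
  Fin a <+> Fin b = Fin (min a b)
  Fin a <.> Fin b = Fin (a + b)
  _     <.> _     = Inf
```

On the list from Exercise 3, both functions give Fin (-6), the smallest sum of any segment.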

5. (See pagelH) Verify that the labelled variant of the usual datatype of lists (namely, List α = μ(F α)
where shape functor F is given by F α β = 1 + α × β) is a datatype of nonempty lists. What is the
labelled variant of externally-labelled binary trees, whose shape functor is F α β = α + β × β?
That of internally-labelled binary trees, whose shape functor is F α β = 1 + α × β × β? And
homogeneous binary trees, whose shape functor is F α β = α + α × β × β?

Solution: For lists, the construction in the text gives labelled variant L α = μ(G α)
where G α β = α × F 1 β = α × (1 + 1 × β) ≃ α + α × β, which does indeed give a
datatype of nonempty lists. For each of the three kinds of binary tree mentioned, we
end up with G α β = α + α × β × β, the shape of homogeneous binary trees.

6. (See page 4) If you're familiar with paramorphisms and with anamorphisms (unfolds), write
subterms_F and scan_F as instances of these.

Solution: Paramorphisms are like catamorphisms (folds), except that at each step we
have access to the original subterms as well as the result of the recursive call on those
subterms. So whereas fold_F takes a body of type F α β → β, the paramorphism para_F
takes a body of type F α (β × μ(F α)) → β. This is convenient for subterms, because it
saves us from having to reconstruct those original subterms using root:

   subterms_F = para_F (in_G · fork (in_F · F id snd, F ! fst))

Sadly, it isn't so convenient for scan, because the original subterms that we have
carefully saved still have to be folded:

   scan_F f = para_F (in_G · fork (f · F id (fold_F f · snd), F ! fst))

Unfolds are the dual of folds; whereas the fold has type (F α β → β) → (μ(F α) → β),
the unfold has type (β → F α β) → (β → μ(F α)), reversing two of the arrows.
(Technically, the unfold generates an element of the greatest fixpoint type ν(F α) instead of
the least fixpoint μ(F α); but in Haskell, based on complete partial orders, these
coincide.) Again, this works quite nicely for subterms: the root is a copy of the whole
structure, and the children are generated recursively from children of the input.

   subterms_F = unfold_G (fork (id, F ! id · out))

However, it is rather inefficient to define scan as an unfold, because there is no sharing
of common subexpressions:

   scan_F f = unfold_G (fork (fold_F f, F ! id · out))
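
Specialized to lists, where the subterms of a list are its tails, these four definitions can be sketched in Haskell. The names paraList, subtermsPara, subtermsUnfold, scanPara and scanUnfold are hypothetical; unfoldr is the standard list unfold from Data.List, and a Maybe seed is used here so that the final empty tail is also produced.

```haskell
import Data.List (unfoldr)

-- Paramorphism on lists: the body sees the original tail x as well as the
-- result of the recursive call on x.
paraList :: (a -> ([a], b) -> b) -> b -> [a] -> b
paraList _ e []      = e
paraList f e (a : x) = f a (x, paraList f e x)

-- subterms as a paramorphism: the saved tail rebuilds the whole list cheaply.
subtermsPara :: [a] -> [[a]]
subtermsPara = paraList (\a (x, ts) -> (a : x) : ts) [[]]

-- subterms as an unfold: each label is a copy of the current seed.
subtermsUnfold :: [a] -> [[a]]
subtermsUnfold = unfoldr step . Just
  where
    step Nothing          = Nothing
    step (Just [])        = Just ([], Nothing)
    step (Just x@(_ : t)) = Just (x, Just t)

-- scan as a paramorphism: every saved tail x is folded again (quadratic work).
scanPara :: (a -> b -> b) -> b -> [a] -> [b]
scanPara f e = paraList (\a (x, bs) -> f a (foldr f e x) : bs) [e]

-- scan as an unfold: each label folds the whole seed, with no sharing.
scanUnfold :: (a -> b -> b) -> b -> [a] -> [b]
scanUnfold f e = unfoldr step . Just
  where
    step Nothing  = Nothing
    step (Just x) = Just (foldr f e x, next x)
    next []       = Nothing
    next (_ : t)  = Just t
```

Both scans agree with the Prelude's scanr on finite lists, but unlike scanr they recompute the fold of each tail from scratch.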

7. (See page 5) Hand-simulate the execution of prune in the finite bag monad on a small
homogeneous binary tree, such as the term Fork (1, Leaf 2, Fork (3, Leaf 1, Leaf 4)) of type

   data Tree α = Leaf α | Fork (α, Tree α, Tree α)

What happens on externally-labelled binary trees? Internally-labelled? How does the result differ
if you let M be sets rather than bags?

Solution: Let's define concrete syntax for the datatype of prunable homogeneous trees,
which have one more constructor:

   data U α = E | L α | F (α, U α, U α)

Writing ⟅…⟆ for a bag, and making use of a "bag comprehension" notation ⟅e | a ← x⟆,
we can unpack the definition of prune to reveal the following recursive characterization:

   prune (Leaf a) = ⟅E, L a⟆
   prune (Fork (a, t, u)) = ⟅E⟆ ⊎ ⟅F (a, t′, u′) | t′ ← prune t, u′ ← prune u⟆

In particular, the three-element subtree rooted at 3 has five prunings, and the whole
five-element tree has eleven prunings:

   ⟅ E, F (1, E, E), F (1, E, F (3, E, E)), F (1, E, F (3, E, L 4)), F (1, E, F (3, L 1, E)),
     F (1, E, F (3, L 1, L 4)), F (1, L 2, E), F (1, L 2, F (3, E, E)),
     F (1, L 2, F (3, E, L 4)), F (1, L 2, F (3, L 1, E)), F (1, L 2, F (3, L 1, L 4)) ⟆
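
The recursive characterization of prune transcribes almost directly into Haskell. In this sketch, lists stand in for finite bags (so the order of results is incidental), and the name example is a hypothetical label for the tree used in the exercise.

```haskell
data Tree a = Leaf a | Fork (a, Tree a, Tree a)

-- Prunable trees: one extra constructor E for a pruned-away subtree.
data U a = E | L a | F (a, U a, U a) deriving (Eq, Show)

-- All prunings of a tree, with a list comprehension as the bag comprehension.
prune :: Tree a -> [U a]
prune (Leaf a)         = [E, L a]
prune (Fork (a, t, u)) = [E] ++ [F (a, t', u') | t' <- prune t, u' <- prune u]

example :: Tree Int
example = Fork (1, Leaf 2, Fork (3, Leaf 1, Leaf 4))
```

Counting the results confirms the figures above: five prunings for the subtree rooted at 3, and eleven for the whole tree.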

8. (See page 5) Pick a shape functor F and a collection monad M; give suitable definitions of
f :: F ℕ ℕ → ℕ to sum all naturals in an F-structure and k :: M ℕ → ℕ to find the maximum of a
collection of naturals; and verify that the rectangle in Section 5 commutes.

Solution: Here's how things work out for lists, using the bag monad, and the concrete
syntax

   data F α β = N | C α β

for the shape functor F α β = 1 + α × β. We define

   f N = 0
   f (C n m) = n + m

to sum an F-structure of naturals, and let k = max, the obvious function to compute the
maximum of a bag of naturals (returning 0, the minimum natural, for the empty bag).
Using the bag comprehension notation from Exercise 7, the distributor δ_2 is given by

   δ_2 N = ⟅N⟆
   δ_2 (C n x) = ⟅C n m | m ← x⟆

We have to show that

   max · M f · δ_2 = f · F id max

which we do by case analysis on the F-structured argument. For the nil case,

   max (M f (δ_2 N))
= { definition of δ_2 }
   max (M f ⟅N⟆)
= { functors; definition of f }
   max ⟅0⟆
= { definition of max }
   0
= { definition of f }
   f N
= { functors }
   f (F id max N)

and for the cons case,

   max (M f (δ_2 (C n x)))
= { definition of δ_2 }
   max (M f ⟅C n m | m ← x⟆)
= { functors }
   max ⟅f (C n m) | m ← x⟆
= { definition of f }
   max ⟅n + m | m ← x⟆
= { addition distributes over maximum }
   n + max x
= { definition of f }
   f (C n (max x))
= { functors }
   f (F id max (C n x))
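
Both sides of the rectangle can be evaluated in Haskell at sample arguments. This sketch again uses lists in place of bags; maxBag, delta2, lhs and rhs are hypothetical names, and Int stands in for the naturals. The cons-case check below uses a non-empty bag, where addition distributes over maximum; in this sketch the two sides differ on C n x when x is empty and n is positive (0 on the left, n on the right).

```haskell
-- Shape functor for lists, in the concrete syntax of the solution.
data F a b = N | C a b

-- Sum an F-structure of naturals.
f :: F Int Int -> Int
f N       = 0
f (C n m) = n + m

-- Maximum of a bag of naturals, returning 0 for the empty bag.
maxBag :: [Int] -> Int
maxBag = foldr max 0

-- The distributor δ_2, pushing the collection out of the second argument.
delta2 :: F a [b] -> [F a b]
delta2 N       = [N]
delta2 (C n x) = [C n m | m <- x]

-- The two paths around the rectangle: max · M f · δ_2  and  f · F id max.
lhs, rhs :: F Int [Int] -> Int
lhs = maxBag . map f . delta2
rhs N       = f N
rhs (C n x) = f (C n (maxBag x))
```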

9. (See page 6) Given an M-algebra k, show that k distributes over ⊎: there exists a binary operator
⊕ such that k (x ⊎ y) = k x ⊕ k y. (Hint: define a ⊕ b = k (return a ⊎ return b).)

Solution: Defining ⊕ following the hint, we have:

   k x ⊕ k y
= { definition of ⊕ }
   k (return (k x) ⊎ return (k y))
= { naturality of return }
   k (M k (return x) ⊎ M k (return y))
= { naturality of ⊎ }
   k (M k (return x ⊎ return y))
= { k a monad algebra }
   k (join (return x ⊎ return y))
= { join distributes over ⊎ }
   k (join (return x) ⊎ join (return y))
= { monads }
   k (x ⊎ y)






10. (See page 6) Given an M-algebra ⊕/, show that ⊕ is associative if ⊎ is; similarly for
commutativity and idempotence. Show also that if ⊎ has a unit ∅, then ⊕ also has a unit, which must equal
⊕/∅. (Hint: show first that ⊕/ is surjective.)

Solution: Given the distributivity of ⊕/ over ⊎ from Exercise 9, it is straightforward
to show that ⊕ is associative on the range of ⊕/, provided that ⊎ is itself associative:

   ⊕/x ⊕ (⊕/y ⊕ ⊕/z) = ⊕/(x ⊎ (y ⊎ z)) = ⊕/((x ⊎ y) ⊎ z) = (⊕/x ⊕ ⊕/y) ⊕ ⊕/z

Moreover, because ⊕/ is an M-algebra, we have ⊕/ · return = id and hence ⊕/ is
surjective, so ⊕ is in fact associative everywhere. (In other words, just replay the
above calculation with x = return a etc.) The same argument works for commutativity,
idempotence, and unit laws.

11. (See page 6) Using the universal property of fold_F, prove the fold fusion law

   h · fold_F f = fold_F g  ⇐  h · f = g · F h

Use this to prove the special case of fold-map fusion

   fold_F f · T g = fold_F (f · F g id)

where T α = μ(F α).

Solution: For the fold fusion law, we have:

   h · fold_F f = fold_F g
⇔ { universal property }
   h · fold_F f · in_F = g · F (h · fold_F f)
⇔ { functors }
   h · fold_F f · in_F = g · F h · F (fold_F f)
⇔ { evaluation rule }
   h · f · F (fold_F f) = g · F h · F (fold_F f)
⇐ { Leibniz }
   h · f = g · F h

The map operation is an instance of fold: with F now a bifunctor, we have

   T g = fold_F (in_F · F g id)

Investigating the fusion condition of the body with fold_F f:

   fold_F f · in_F · F g id
= { evaluation rule }
   f · F id (fold_F f) · F g id
= { functors }
   f · F g id · F id (fold_F f)

justifies the fold-map fusion law:

   fold_F f · T g = fold_F (f · F g id)
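
For lists, fold-map fusion is the familiar fact that a fold after a map is a single fold. The sketch below states both sides with the Prelude's curried foldr; lhsFusion and rhsFusion are hypothetical names.

```haskell
-- Left-hand side: map g first, then fold with f and e.
lhsFusion :: (b -> c -> c) -> c -> (a -> b) -> [a] -> c
lhsFusion f e g = foldr f e . map g

-- Right-hand side: one fold whose body applies g to each element,
-- the list instance of fold_F (f · F g id).
rhsFusion :: (b -> c -> c) -> c -> (a -> b) -> [a] -> c
rhsFusion f e g = foldr (f . g) e
```

The fused version traverses the list once and builds no intermediate list.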

12. (See page 6) Use fold fusion to calculate a characterization of ⊕/ · M (fold_H (maybe b f)) · prune
as a fold, assuming the distributivity property ⊕/ · M f · δ_2 = f · F id (⊕/).

Solution:

   ⊕/ · M (fold_H g) · prune · in_F
= { evaluation for prune }
   ⊕/ · M (fold_H g) · M in_H · opt Nothing · M Just · δ_2 · F id prune
= { functors; evaluation for fold_H }
   ⊕/ · M (g · H id (fold_H g)) · opt Nothing · M Just · δ_2 · F id prune
= { M h · opt a = opt (h a) · M h }
   ⊕/ · opt (g Nothing) · M (g · H id (fold_H g)) · M Just · δ_2 · F id prune
= { functors; Just :: F α → H α }
   ⊕/ · opt (g Nothing) · M g · M (Just · F id (fold_H g)) · δ_2 · F id prune
= { functors; δ_2 :: (F α) M → M (F α) }
   ⊕/ · opt (g Nothing) · M (g · Just) · δ_2 · F id (M (fold_H g)) · F id prune
= { g = maybe b f }
   ⊕/ · opt b · M f · δ_2 · F id (M (fold_H g)) · F id prune
= { ⊕/ · opt b = (b⊕) · ⊕/ }
   (b⊕) · ⊕/ · M f · δ_2 · F id (M (fold_H g)) · F id prune
= { distributivity: ⊕/ · M f · δ_2 = f · F id (⊕/) }
   (b⊕) · f · F id (⊕/) · F id (M (fold_H g)) · F id prune
= { functors }
   (b⊕) · f · F id (⊕/ · M (fold_H g) · prune)

Therefore,

   ⊕/ · M (fold_H (maybe b f)) · prune = fold_F ((b⊕) · f)

13. (See page 6) Show by calculation that

   ⊕/ · M (fold_H (maybe b f)) · segs = ⊕/ · contents_L · scan_F ((b⊕) · f)

Solution:

   ⊕/ · M (fold_H (maybe b f)) · segs
= { definition of segs }
   ⊕/ · M (fold_H (maybe b f)) · join · M prune · contents_L · subterms
= { naturality of join :: M M → M; functors }
   ⊕/ · join · M (M (fold_H (maybe b f)) · prune) · contents_L · subterms
= { ⊕/ is an M-algebra; functors }
   ⊕/ · M (⊕/ · M (fold_H (maybe b f)) · prune) · contents_L · subterms
= { naturality of contents :: L → M }
   ⊕/ · contents_L · L (⊕/ · M (fold_H (maybe b f)) · prune) · subterms
= { Horner's Rule }
   ⊕/ · contents_L · L (fold_F ((b⊕) · f)) · subterms
= { Scan Lemma }
   ⊕/ · contents_L · scan_F ((b⊕) · f)

14. (See page [H) Convert the big commuting diagram in Section 7 into an equational proof of the
distributivity property ⊕/ · M f · δ_2 = f · F id (⊕/), assuming the properties captured by each of the
individual faces.

Solution:

   ⊕/ · M f · δ_2
= { defining δ_2 in terms of δ_F }
   ⊕/ · M f · δ_F · F return id
= { defining f in terms of contents; functors }
   ⊕/ · M (foldr (⊗) b) · M contents_F · δ_F · F return id
= { δ respects contents }
   ⊕/ · M (foldr (⊗) b) · δ_List · contents_F · F return id
= { distributivity for lists (Exercise 15) }
   foldr (⊗) b · List (⊕/) · contents_F · F return id
= { naturality of contents }
   foldr (⊗) b · contents_F · F (⊕/) (⊕/) · F return id
= { functors, monad algebras }
   foldr (⊗) b · contents_F · F id (⊕/)
= { defining f in terms of contents }
   f · F id (⊕/)

15. (See page 8) Use fold fusion to prove that

   foldr (⊗) b · List (⊕/) = ⊕/ · M (foldr (⊗) b) · δ_List

assuming the distributivity property ⊕/ · M (⊗) · cp = (⊗) · (⊕/) × (⊕/) of cp.

Solution: For the base case we have

   ⊕/ (M (foldr (⊗) b) (return [])) = ⊕/ (return (foldr (⊗) b [])) = ⊕/ (return b) = b

as required. For the inductive step, we have:

   ⊕/ · M (foldr (⊗) b) · M (:) · cp
= { functors }
   ⊕/ · M (foldr (⊗) b · (:)) · cp
= { evaluation for foldr }
   ⊕/ · M ((⊗) · id × foldr (⊗) b) · cp
= { functors; naturality of cp }
   ⊕/ · M (⊗) · cp · M id × M (foldr (⊗) b)
= { distributivity for cp: see below }
   (⊗) · (⊕/) × (⊕/) · M id × M (foldr (⊗) b)
= { functors }
   (⊗) · (⊕/) × id · id × (⊕/ · M (foldr (⊗) b))

16. (See page 9) Prove the distributivity property for cartesian product

   ⊕/ · M (⊗) · cp = (⊗) · (⊕/) × (⊕/)

assuming the two distributivity properties ⊕/ · M (a⊗) = (a⊗) · ⊕/ and ⊕/ · M (⊗b) = (⊗b) · ⊕/
for collections.

Solution: Writing $ for right-associative, loosest-binding application, to reduce
parentheses, we have:

   ⊕/ $ M (⊗) $ cp (x, y)
= { definition of cp }
   ⊕/ $ M (⊗) $ join $ M (λa . M (a,) y) x
= { naturality }
   ⊕/ $ join $ M (λa . M (a⊗) y) x
= { ⊕/ is an M-algebra }
   ⊕/ $ M (⊕/) $ M (λa . M (a⊗) y) x
= { functors }
   ⊕/ $ M (λa . ⊕/ (M (a⊗) y)) x
= { distributivity for collections: see below }
   ⊕/ $ M (λa . a ⊗ (⊕/ y)) x
= { sectioning }
   ⊕/ $ M (⊗ (⊕/ y)) x
= { distributivity for collections again }
   (⊗ (⊕/ y)) (⊕/ x)
= { sectioning }
   (⊕/ x) ⊗ (⊕/ y)
= { eta-expansion }
   (⊗) $ (⊕/ × ⊕/) $ (x, y)

17. (See page 9) Prove the two distributivity properties for collections

   ⊕/ · M (a⊗) = (a⊗) · ⊕/
   ⊕/ · M (⊗b) = (⊗b) · ⊕/

by induction over the size of the (finite!) collection, assuming that binary operator ⊗ distributes
over ⊕ in the familiar sense (that is, a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)).

Solution: The base cases are for empty and singleton collections; the case for the
empty collection follows from e_⊕ being a zero of ⊗ (that is, e_⊕ ⊗ a = e_⊕ = a ⊗ e_⊕ for
any a), and the case for singleton collections, i.e. those in the image of return, follows
from the fact that ⊕/ is an M-algebra. The inductive step is for a collection of the form
u ⊎ v with u, v both strictly smaller than the whole (so, if the monad is idempotent, we
should use disjoint union, or at least not the trivial union of a collection with one of
its subcollections); this requires the distribution of the algebra over choice ⊕/(u ⊎ v) =
(⊕/u) ⊕ (⊕/v), together with the familiar distribution of ⊗ over ⊕.
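
The distributivity property for cartesian product from Exercise 16 can be spot-checked in Haskell for one familiar instance, ⊕ = max and ⊗ = +, with lists standing in for collections. The names cp, lhsCp and rhsCp are hypothetical, and the sketch assumes non-empty collections, since maximum is undefined on the empty list.

```haskell
-- Cartesian product of two collections.
cp :: ([a], [b]) -> [(a, b)]
cp (x, y) = [(a, b) | a <- x, b <- y]

-- Left-hand side:  ⊕/ · M (⊗) · cp,  with ⊕ = max and ⊗ = (+).
-- Right-hand side: (⊗) · (⊕/) × (⊕/).
lhsCp, rhsCp :: ([Int], [Int]) -> Int
lhsCp = maximum . map (uncurry (+)) . cp
rhsCp (x, y) = maximum x + maximum y
```

The left-hand side examines every pair, while the right-hand side needs only one pass over each collection; this is the saving that distributivity provides.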



